Create a Quarto file for ALL Lab 2 (no separate files for Parts 1 and 2).
Make sure your final file is carefully formatted, so that each analysis is clear and concise.
Be sure your knitted .html file shows all your source code, including any function definitions.
Part One: Identifying Bad Visualizations
If you happen to be bored and looking for a sensible chuckle, you should check out these Bad Visualisations. Looking through these is also a good exercise in cataloging what makes a visualization good or bad.
Dissecting a Bad Visualization
Below is an example of a less-than-ideal visualization from the collection linked above. It comes to us from data provided for the Wellcome Global Monitor 2018 report by the Gallup World Poll:
While there are certainly issues with this image, do your best to tell the story of this graph in words. That is, what is this graph telling you? What do you think the authors meant to convey with it?
This graph shows the percentages of people within countries that believe in the safety of vaccines. This graph has sorted the countries by their global region and determines the ordering of the global region by using the region’s median percentage of people that believe in the safety of vaccines. The graph tries to convince the audience that certain global regions are more likely to believe that vaccines are safe, and also gives us information on individual countries’ percentages.
List the variables that appear to be displayed in this visualization. Hint: Variables refer to columns in the data.
Country: The country in which the survey was conducted.
Region: The global region in which the country is located.
Percentage: The percentage of people in the country that believe in the safety of vaccines.
Now that you’re versed in the grammar of graphics (e.g., ggplot), list the aesthetics used and which variables are mapped to each.
Aesthetics:
The percentage of people who believe that vaccines are safe is mapped to the x-axis.
The global region variable is mapped to color and is also used to facet the plot into one panel per region (faceting is not strictly an aesthetic, but it is how region is displayed here).
What type of graph would you call this? Meaning, what geom would you use to produce this plot?
I would call this a scatterplot and I would use geom_point() to produce this plot.
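To make the mapping concrete, here is a minimal sketch of how a plot with these aesthetics could be specified. The data frame `vaccine_toy`, its column names, and all of its values are invented for illustration; they are not the Wellcome data.

```r
library(ggplot2)

# Invented values for illustration only; not taken from the Wellcome data
vaccine_toy <- data.frame(
  country = c("Country A", "Country B", "Country C",
              "Country D", "Country E", "Country F"),
  region = c("Asia", "Asia", "Europe", "Europe", "Africa", "Africa"),
  percentage = c(0.95, 0.81, 0.59, 0.73, 0.88, 0.92)
)

original_style <- ggplot(
  vaccine_toy,
  aes(
    x = percentage,                     # percentage -> x-axis
    y = reorder(country, percentage),   # one row per country, sorted by value
    color = region                      # region -> color
  )
) +
  geom_point() +
  # region is also used to split the plot into stacked panels
  facet_wrap(~ region, ncol = 1, scales = "free_y") +
  scale_x_continuous(labels = scales::percent) +
  labs(x = "Believe vaccines are safe", y = NULL, color = "Region")

original_style
```

Region appears twice in this specification, once as color and once as the faceting variable, which is the redundancy noted in the answer above.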
Provide at least four problems or changes that would improve this graph. Please format your changes as bullet points!
I think the graph would look better and be easier to understand if we rotated it 90º (i.e., using coord_flip()).
I feel like a box plot for each global region would be better here, if the main goal is to compare the median percentage of people who believe that vaccines are safe across global regions. Otherwise, I might consider using some kind of heat map that shows the percentage based on varying degrees of color intensity so that a person can visualize on a map people’s opinion of vaccine safety.
I would probably change the color palette, because the current one is not necessarily attractive. I’d make sure the new color palette is color-blind friendly.
I would not use Comic Sans; I’d change it to another sans-serif font. Comic Sans does not suit this graph or scientific reporting in general.
Stacking each global region’s section on top of the others is misleading. To my eye, it appears that all the countries of Asia have higher reported percentages of thinking vaccines are safe than all countries in the Americas. Again, I think the graph would benefit if the percentage variable were on the y-axis instead of the x-axis. This would also make percentages easier to compare between regions.
The regions’ ordering is meaningless; I’d probably redefine the regions to make them more meaningful (i.e., based on the seven main world regions). I’m not quite sure why the former Soviet Union was chosen as a world region for this graph.
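One way to act on the color-blind-friendly palette suggestion is to check which ColorBrewer palettes are flagged as such. The sketch below assumes the RColorBrewer package is installed; `brewer.pal.info` is its built-in lookup table, while the object names `qual_cb` and `fill_colors` are made up for this example.

```r
library(RColorBrewer)

# brewer.pal.info has one row per palette, with its category
# ("qual", "seq", "div") and a logical colorblind flag
qual_cb <- subset(brewer.pal.info, category == "qual" & colorblind)

# Names of the qualitative palettes flagged as color-blind friendly
rownames(qual_cb)

# Pull six colors (one per region, say) from the first such palette
fill_colors <- brewer.pal(n = 6, name = rownames(qual_cb)[1])
fill_colors
```

Any palette name returned here can be passed to `scale_fill_brewer(palette = ...)` or `scale_color_brewer(palette = ...)` in ggplot2.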
# create df with country and assigned region
country_region <- full_df |>
  select(WP5, Regions_Report) |>
  distinct() |>
  left_join(country_w_codes, by = c("WP5" = "code")) |>
  left_join(regions_codes, by = c("Regions_Report" = "code")) |>
  select(country, region) |>
  # replace republic of congo and palestine to match Crosstab country list
  mutate(
    country = case_when(
      str_detect(country, "Palestinian") ~ "Palestine",
      country == "Republic of Congo" ~ "Congo, Rep.",
      TRUE ~ country
    )
  )

# assign region to plotting data frame with a join
plot_df <- wd |>
  left_join(country_region, by = "country") |>
  # create new regions
  mutate(
    continent = case_when(
      str_detect(region, "Asia") ~ "Asia",
      str_detect(region, "America") ~ "Americas",
      str_detect(region, "Europe") ~ "Europe",
      str_detect(region, "Africa") ~ "Africa",
      region == "Middle East" ~ "Middle East and North Africa",
      region == "Aus/NZ" ~ "Oceania",
      TRUE ~ "Not Assigned"
    )
  )
plot_df <- plot_df |>
  # calculate percentage of vaccine agree %s by country
  group_by(country) |>
  mutate(
    percentage = sum(column_n_percent_4, na.rm = TRUE)
  ) |>
  ungroup() |>
  # calculate median percentage of vaccine agree %s by region
  group_by(continent) |>
  mutate(
    median_percentage = median(percentage, na.rm = TRUE)
  ) |>
  ungroup() |>
  # only keep one row for each country (remove dupes)
  filter(response != "Somewhat agree") |>
  select(country, region, percentage, median_percentage, continent) |>
  # ordering of region and country
  mutate(
    country = fct_reorder(country, percentage)
  )
Improve the visualization above by either re-creating it with the issues you identified fixed OR by creating a new visualization that you believe tells the same story better.
# custom function to get continent sizes
n_fun <- function(x) {
  return(data.frame(y = 1.1, label = paste0("n = ", length(x))))
}

update_geom_defaults("text", list(size = 2.7, family = "sans"))

plot <- plot_df |>
  filter(continent != "Not Assigned") |>
  ggplot(mapping = aes(x = continent, y = percentage, fill = continent)) +
  geom_boxplot() +
  labs(
    title = "Percentage of People Who Believe Vaccines are Safe, by Continent",
    subtitle = "n = number of countries",
    x = "",
    y = ""
  ) +
  theme_bw() +
  theme(
    text = element_text(family = "sans"),
    legend.position = "none",
    plot.title = element_text(hjust = 1.13),
    plot.subtitle = element_text(hjust = -0.67, face = "italic"),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_blank(),
    axis.ticks.y = element_blank()
  ) +
  scale_y_continuous(
    labels = scales::percent_format(scale = 100),
    breaks = seq(0, 1, by = 0.25),
    limits = c(0.24, 1.2)
  ) +
  scale_fill_brewer(palette = 2, type = "qual") +
  stat_summary(fun.data = n_fun, geom = "text", hjust = 0.4) +
  coord_flip()

ggsave(
  here::here("image", "improved-wellcome-graph.png"),
  plot = plot,
  width = 6,
  height = 4,
  dpi = 300
)

plot
For this second plot, you must select a plot that uses maps so you can demonstrate your proficiency with the leaflet package!
Select a data visualization in the report that you think could be improved. Be sure to cite both the page number and figure title. Do your best to tell the story of this graph in words. That is, what is this graph telling you? What do you think the authors meant to convey with it?
I selected chart 2.14, the “Map of interest in knowing more about medicine, disease or health by country” on page 39. This map shows the percentage of people that reported “yes” to the survey question “Would you, personally, like to know more about medicine, disease or health?”. The darker the color of the country, the “more interested” that country’s people are. I think the authors mean to convey countries’ “openness” to learning more about health and medicine and show where this openness is more concentrated on the globe.
List the variables that appear to be displayed in this visualization.
The variables that appear on this map are:
percentage of people that replied yes to this survey question
country name
Now that you’re versed in the grammar of graphics (ggplot), list the aesthetics used and which variables are specified for each.
The percentage variable is mapped to polygon fill color. Each country’s name and percentage are mapped to the polygon labels.
What type of graph would you call this?
Looking up graph types for maps, this is a choropleth map (https://www.esri.com/arcgis-blog/products/insights/analytics/data-visualization-types).
List all of the problems or things you would improve about this graph.
I would improve the graph’s interactivity. I know this is a graph in a PDF, but I would want to be able to hover over a country and see its name and exact percentage.
I would also add a legend to the map that shows the percentage of people that answered “yes” to the survey question.
I think I would use a diverging palette with a different color at each end of the scale, rather than different shades of green.
I think there could be more labeling or text on the map itself.
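The hover label, legend, and two-ended palette ideas listed above can be sketched together in leaflet. This is only an illustration under stated assumptions: the data frame `toy`, its country names, coordinates, and percentages are all invented, and circle markers stand in for country polygons so the example needs no map data. The same `pal` and `addLegend()` call would attach to an `addPolygons()` layer in the same way.

```r
library(leaflet)

# Invented countries, coordinates, and percentages, for illustration only
toy <- data.frame(
  country = c("Country A", "Country B", "Country C", "Country D"),
  lng = c(-50, 38, 138, 2),
  lat = c(-14, 0, 36, 46),
  percentage_yes = c(78, 85, 41, 60)
)

# Diverging palette: a different hue at each end of a fixed 0-100 scale
pal <- colorNumeric(palette = "RdYlBu", domain = c(0, 100))

legend_map <- leaflet(toy) |>
  addTiles() |>
  addCircleMarkers(
    lng = ~lng,
    lat = ~lat,
    fillColor = ~pal(percentage_yes),
    fillOpacity = 0.8,
    stroke = FALSE,
    # Hover label showing the country name and its exact percentage
    label = ~paste0(country, ": ", percentage_yes, "%")
  ) |>
  # Legend tied to the same palette so colors can be read as percentages
  addLegend(
    position = "bottomright",
    pal = pal,
    values = ~percentage_yes,
    title = "Answered 'yes'",
    labFormat = labelFormat(suffix = "%"),
    opacity = 0.8
  )

legend_map
```

Fixing the palette domain at 0 to 100 rather than the observed range keeps the midpoint color at 50%, which is usually what a reader expects from a diverging scale.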
Improve the visualization above by either re-creating it with the issues you identified fixed OR by creating a new visualization that you believe tells the same story better.
map <- leaflet() |>
  addTiles()

# Get world map data from Natural Earth
world <- ne_countries(
  scale = "medium", # size
  returnclass = "sf" # output object
)

# Merge Poll data with the world map data
world_data <- world |>
  left_join(map_df, by = c("adm0_a3" = "iso_a3"))

# Define color palette based on percentage
pal <- colorNumeric(
  palette = "YlOrRd", # Color palette
  domain = world_data$percentage_yes
)

# Create leaflet map
# add a plot label
map_plot <- world_data |>
  leaflet() |>
  addTiles() |>
  addPolygons(
    fillColor = ~pal(percentage_yes),
    color = "black", # Border color
    weight = 1, # Border weight
    fillOpacity = 0.7,
    highlightOptions = highlightOptions(
      weight = 2,
      color = "white",
      fillOpacity = 0.7,
      bringToFront = TRUE
    ),
    # Tooltip label
    # country: %
    label = ifelse(
      is.na(world_data$percentage_yes),
      paste0('No data available.'),
      paste0(world_data$country, ": ", world_data$percentage_yes, "%")
    ),
    labelOptions = labelOptions(
      style = list("font-weight" = "normal", padding = "3px 8px"),
      textsize = "15px",
      direction = "auto"
    )
  ) |>
  # Graph title
  addControl(
    html = "<div style='font-size: 16px; font-weight: bold; margin: 5px;'> Percentage of People that are interested in health, disease, or medicine</div>\n (said 'Yes' on Question 9 in Gallup Poll 2018)",
    position = "bottomleft" # Adjust position as needed
  )

# Display the map
map_plot
Third Data Visualization Improvement
For this third plot, you must use one of the other ggplot2 extension packages mentioned this week (e.g., gganimate, plotly, patchwork, cowplot).
Select a data visualization in the report that you think could be improved. Be sure to cite both the page number and figure title. Do your best to tell the story of this graph in words. That is, what is this graph telling you? What do you think the authors meant to convey with it?
I chose Chart 2.15 on page 40 titled, “Scatterplot exploring interest in science by those who have sought information.” I think this plot is trying to display the correlation between a country’s interest in science and the percentage of people that have sought information about health, disease, or medicine. The authors are trying to show that there is a positive correlation between these two variables. The more people are interested in science, the more likely they are to seek information about health, disease, or medicine.
List the variables that appear to be displayed in this visualization.
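The variables used in this plot are the percentage of people in each country who answered “yes” to Question 8 (interest in knowing more about science) and the percentage who answered “yes” to Question 7 (sought information about medicine, disease, or health in the past 30 days). Each point represents a country.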
Now that you’re versed in the grammar of graphics (ggplot), list the aesthetics used and which variables are specified for each.
The aesthetics used in this plot are:
the percentage of people that answered “yes” to survey question 8 “Would you, personally, like to know more about science?” is mapped to the x-axis.
the percentage of people that answered “yes” to survey question 7 “Have you, personally, tried to get any information about medicine, disease, or health in the past 30 days?” is mapped to the y-axis.
What type of graph would you call this?
This is a scatterplot.
List all of the problems or things you would improve about this graph.
I would map another variable to the color aesthetic: region.
I would add a hover feature so that you can see the country name and both of its corresponding percentages.
I would make a point corresponding to the whole world as a region highlighted on the graph, so that viewers can compare countries’ opinions to the world as a whole.
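The three changes listed above (color by region, hover text, and a highlighted world point) could be combined with plotly roughly as follows. Everything in the data is an assumption made for the sketch: the country names and percentages are invented, and the "World" point is simply the mean of the toy rows rather than a real global figure.

```r
library(ggplot2)
library(plotly)

# Invented values for illustration only; not taken from the Wellcome data
countries <- data.frame(
  country = c("Country A", "Country B", "Country C",
              "Country D", "Country E", "Country F"),
  region = c("Africa", "Africa", "Americas", "Asia", "Europe", "Europe"),
  interest_science = c(0.72, 0.65, 0.58, 0.61, 0.49, 0.55),
  sought_health_info = c(0.35, 0.30, 0.45, 0.33, 0.40, 0.47)
)

# Hypothetical "World" reference point: the mean of the toy rows
world <- data.frame(
  country = "World",
  interest_science = mean(countries$interest_science),
  sought_health_info = mean(countries$sought_health_info)
)

# Build the hover text: country name plus both percentages
hover_text <- function(d) {
  paste0(
    d$country,
    "<br>Interested in science: ",
    scales::percent(d$interest_science, accuracy = 1),
    "<br>Sought health info: ",
    scales::percent(d$sought_health_info, accuracy = 1)
  )
}
countries$hover <- hover_text(countries)
world$hover <- hover_text(world)

p <- ggplot(
  countries,
  aes(x = interest_science, y = sought_health_info, text = hover)
) +
  geom_point(aes(color = region), size = 2) +
  # Larger black diamond marks the world reference point
  geom_point(data = world, shape = 18, size = 5, color = "black") +
  scale_x_continuous(labels = scales::percent) +
  scale_y_continuous(labels = scales::percent) +
  labs(
    x = "Would like to know more about science",
    y = "Sought health information in past 30 days",
    color = "Region"
  ) +
  theme_bw()

# Convert to an interactive plot that shows only the custom hover text
interactive_plot <- ggplotly(p, tooltip = "text")
interactive_plot
```

Passing `tooltip = "text"` limits the hover box to the custom string, so the reader sees the country name and both percentages without the raw axis values repeated.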
Improve the visualization above by either re-creating it with the issues you identified fixed OR by creating a new visualization that you believe tells the same story better.